NONANTICIPATING ESTIMATION APPLIED TO SEQUENTIAL
ANALYSIS AND CHANGEPOINT DETECTION 

By Gary Lorden and Moshe Pollak
California Institute of Technology and Hebrew University of Jerusalem 

Suppose a process yields independent observations whose distributions
belong to a family parameterized by $\theta \in \Theta$. When the process
is in control, the observations are i.i.d. with a known parameter value
$\theta_0$. When the process is out of control, the parameter changes. We
apply an idea of Robbins and Siegmund [Proc. Sixth Berkeley Symp. 
Math. Statist. Probab. 4 (1972) 37-41] to construct a class of sequen- 
tial tests and detection schemes whereby the unknown post-change 
parameters are estimated. This approach is especially useful in situa- 
tions where the parametric space is intricate and mixture-type rules 
are operationally or conceptually difficult to formulate. We exemplify 
our approach by applying it to the problem of detecting a change in 
the shape parameter of a Gamma distribution, in both a univariate 
and a multivariate setting. 

1. Introduction. In all but the simplest cases, the problem of detecting
a change involves at least one unknown post-change parameter. In the
well-known Shiryayev-Roberts detection scheme [11, 12], a change from
parameter value $\theta = \theta_0$ (possibly multidimensional) to
$\theta = \theta_1$, say, in the distribution of a sequence of i.i.d.
observations $X_1, X_2, \ldots$ is detected by a stopping rule

$$N_A = \min\{n \mid R_n \ge A\},$$

where

$$R_n = \sum_{k=1}^{n} \Lambda_{n,k}$$

and

$$\Lambda_{n,k} = \prod_{i=k}^{n} f_{\theta_1}(X_i)/f_{\theta_0}(X_i).$$

When the post-change $\theta_1$ is not known and it is desired to respond quickly
to a broad range of possible values, the Shiryayev-Roberts (SR) rule is in
principle easy to modify: just introduce a mixing measure $\lambda(\theta)$ and define

$$\Lambda_{n,k} = \int \prod_{i=k}^{n} [f_{\theta}(X_i)/f_{\theta_0}(X_i)] \, d\lambda(\theta).$$

As is well known [5], this approach preserves the martingale property of the
sequence $\{R_n - n\}$ under the "no change" probability measure, $P_\infty$, so that

$$E_\infty N_A = E_\infty R_{N_A} \ge A,$$

a useful lower bound on the average run length (ARL) to false alarm. Moreover,
it is typically true that [5]

$$E_\infty N_A / A \to 1/\gamma \quad \text{as } A \to \infty,$$

where $\gamma$ can be either evaluated by renewal-theoretic methods or simulated,
which suggests using the approximation

$$E_\infty N_A \approx A/\gamma.$$

In practice, however, it is usually difficult to carry out the computation
of the $\Lambda_{n,k}$'s unless the mixing measure $\lambda$ can be chosen as a natural
conjugate prior. Moreover, in many cases, particularly when $\theta$ is
multidimensional, it is conceptually difficult to make natural choices of
$\lambda$. The present paper suggests an alternative approach, based upon defining

$$\Lambda_{n,k} = \prod_{i=k}^{n} f_{\theta_{i,k}}(X_i)/f_{\theta_0}(X_i),$$

where $\theta_{i,k}$ is an estimator of $\theta$ based upon $X_k, \ldots, X_{i-1}$. The same idea
appears in [9] in the context of sequential hypothesis testing and is applied
to the changepoint problem in [1, 3]. By requiring $\theta_{n,k}$ not to depend on
$X_n$, one preserves the martingale property of $\{R_n - n\}$ and similarly the
upper bound on significance levels that Robbins and Siegmund rely upon.
The potential advantage of this approach in complicated settings is that 
simple estimators based on the method of moments or maximum likelihood 
are usually much easier to choose than appropriate mixtures, as well as 
substantially faster to compute. 
Moreover, in many cases: 
1. The asymptotic "overshoot correction," $\gamma$, is finite and can be evaluated
readily by simulating Robbins-Siegmund-type hypothesis tests rather 
than the changepoint detection rules themselves. 

2. The proposed Shiryayev-Roberts-Robbins-Siegmund (SRRS) detection 
rules have reasonably good efficiency, that is, short post-change delays to 
detection, nearly as good as mixture rules. 

Since the overshoot constant $\gamma$ is most easily evaluated in the context of
hypothesis testing, Section 2 and part of Section 3 are devoted to problems
of testing. Proofs of the asymptotic analysis of the operating characteristics 
of the SRRS rules involve formidable calculations. Therefore, our approach 
in the present paper is to illustrate the arguments in special cases, the test- 
ing of hypotheses about the mean of a normal distribution (Section 2) and 
hypothesis testing and changepoint detection for the shape parameter, $\theta$, of
a Gamma distribution (Sections 3-5). We believe that these special cases 
provide a good introduction to the type of argument that will work in many 
other contexts. 

Section 2 illustrates the pattern of the asymptotic behavior of the estimator
sequence and the determination of $\gamma$. It turns out that there is a natural
correspondence between choices of an estimator sequence and a choice of
mixture $\lambda$, suggesting that, at least asymptotically, the two methods have a
natural relationship. In Sections 3 and 5 we give asymptotic results for the 
Gamma shape example, showing in particular that the asymptotic efficiency 
of the estimator sequence used in the SRRS scheme determines the coeffi- 
cient of the second-order term in the asymptotic expansion for the expected 
delay to detection. In particular, an asymptotically efficient sequence of esti- 
mators yields a second-order asymptotically optimal detection scheme. (For 
comparison, Dragalin's [1] scheme does not achieve this.) Section 4 gives 
results of Monte Carlo simulations of the performance of the SRRS scheme 
using the method-of-moments and maximum likelihood estimators of the 
Gamma shape parameter, as well as comparisons with other changepoint 
detection rules. Section 5 illustrates the application of the SRRS approach 
to multiparameter problems. 

2. A first example: tests for the value of a normal mean. Let $X_1, X_2, \ldots$
be a sequence of independent $N(\mu, 1)$-distributed random variables, and
suppose one is interested in a power one $\alpha$-level test of $H_0: \mu = 0$ versus
$H_1: \mu \ne 0$. Robbins and Siegmund [9] introduced the following sequential
test: Let $\mu_n$ be an $F_{n-1} = F(X_1, \ldots, X_{n-1})$-measurable estimate of $\mu$
(where $F_0$ is the trivial $\sigma$-field), define
$\Lambda_n = \exp\{\sum_{i=1}^{n} (\mu_i X_i - \mu_i^2/2)\}$,
$T_b = \min\{n \mid n \ge 1, \Lambda_n \ge \exp\{b\}\}$ ($T_b = \infty$ if no such n exists); reject $H_0$ if and
only if $T_b < \infty$. By using the martingale property of $\Lambda_n$ under $H_0$, Robbins
and Siegmund showed that $\alpha = P_{H_0}(T_b < \infty) \le \exp\{-b\}$. In this section, we
will give an approximation for $\alpha$ for a special case of the sequence $\{\mu_n\}$
when b is large.

Let $\mu_1 = s/t$ (= 0 if s = t = 0) and $\mu_{n+1} = (n\bar X_n + s)/(n + t)$, where
$\bar X_n = \sum_{i=1}^{n} X_i/n$ and $-\infty < s < \infty$, $0 < t < \infty$ or s = t = 0 are constants. (This
would be a natural estimate of $\mu$ after n observations on test when prior to
testing there is a learning sample of t observations whose sum is s; it is also
a way of incorporating a prior distribution into the testing scheme.) In other
words, after every observation we update our estimate of $\mu$, and
$\exp\{\mu_n X_n - \mu_n^2/2\}$ is the estimated likelihood ratio for the nth observation.
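The test just described can be sketched in a few lines. The following is only an illustrative sketch: the function name `robbins_siegmund_normal` and its argument layout are hypothetical, and the estimate used before observation n is taken to be $(\sum_{i<n} X_i + s)/(n - 1 + t)$, which is the recursion above re-indexed.

```python
def robbins_siegmund_normal(xs, b, s=0.0, t=0.0):
    """Return the stopping time T_b (1-based) for H0: mu = 0, or None if the
    log-statistic never reaches b within the supplied observations.

    Assumes either s = t = 0 or t > 0, as in the text.
    """
    log_lambda = 0.0   # log of Lambda_n
    running_sum = 0.0  # X_1 + ... + X_{n-1}
    for n, x in enumerate(xs, start=1):
        denom = (n - 1) + t
        # mu_n depends only on X_1..X_{n-1} (nonanticipating); mu_1 = 0 if s = t = 0
        mu_n = (running_sum + s) / denom if denom > 0 else 0.0
        # estimated log-likelihood ratio of the nth observation
        log_lambda += mu_n * x - 0.5 * mu_n ** 2
        if log_lambda >= b:
            return n
        running_sum += x
    return None
```

With constant data $X_i = 1$ and s = t = 0, the first increment is 0 and every later one is 1/2, so the rule with b = 3 stops at n = 7; with $X_i = 0$ the estimate stays at 0 and the rule never stops.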

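Let $G_{s,t}$ be the $N(s/t, \sum_{i=1}^{\infty} (i + t)^{-2})$ c.d.f. (where s/t = 0 if s = t = 0).
Let $\mu_1 = s/t$, $v^2(t) = \sum_{i=1}^{\infty} (i + t)^{-2}$ and

$$\nu(\mu) = 2\mu^{-2} \exp\Bigl\{-2 \sum_{n=1}^{\infty} n^{-1} \Phi\bigl(-\tfrac{1}{2}|\mu|\sqrt{n}\bigr)\Bigr\}. \tag{1}$$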

Theorem 1. As $b \to \infty$, when $\mu_1 = s/t$ (= 0 if s = t = 0) and
$\mu_{n+1} = (n\bar X_n + s)/(n + t)$, the Robbins-Siegmund power one test of $H_0: \mu = 0$ versus
$H_1: \mu \ne 0$ has significance level

$$\alpha = P_{H_0}(T_b < \infty) = (1 + o(1))\,\gamma \exp\{-b\}, \tag{2}$$

where $\gamma = \int_{-\infty}^{\infty} \nu(y)\, dG_{s,t}(y)$.
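The two ingredients of $\gamma$ in Theorem 1 can be evaluated numerically. The sketch below is an illustration rather than part of the paper: the function names and the truncation points are assumptions, the series in (1) is simply truncated (which is adequate for moderate $|\mu|$ but converges slowly as $\mu \to 0$), and the tail of $v^2(t)$ is approximated by an integral.

```python
import math

def std_normal_cdf(x):
    """Standard normal c.d.f. via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def nu(mu, terms=10_000):
    """Truncated evaluation of nu(mu) in (1); mu must be nonzero."""
    if mu == 0:
        raise ValueError("nu is defined here only for mu != 0")
    series = sum(std_normal_cdf(-0.5 * abs(mu) * math.sqrt(n)) / n
                 for n in range(1, terms + 1))
    return 2.0 / mu ** 2 * math.exp(-2.0 * series)

def v_squared(t, terms=200_000):
    """v^2(t) = sum_{i>=1} (i + t)^(-2), truncated with an integral tail estimate."""
    head = sum(1.0 / (i + t) ** 2 for i in range(1, terms + 1))
    tail = 1.0 / (terms + t + 0.5)  # approximates the sum over i > terms
    return head + tail
```

For t = 0 the variance $v^2(0)$ equals $\pi^2/6$, which gives a quick sanity check on the truncation.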



Remark. A theorem analogous to Theorem 1 can be formulated for 
an arbitrary sequence $\{\mu_n\}$. Its practical value is usually as a statement of
existence of a limit, which one can evaluate by simulation. The analog to 
Gs,t is generally very hard to compute. 

Proof of Theorem 1. Let Q be the measure on $\{X_1, X_2, \ldots\}$ under
which the distribution of $X_n$ conditional on $X_1, \ldots, X_{n-1}$ is $N(\mu_n, 1)$;
n = 1, 2, .... [By abuse of notation, we will let $Q(X_1, \ldots, X_n)$ denote the
distribution of $X_1, \ldots, X_n$ under Q.] Let $P_Q$ and $E_Q$ denote probability and
expectation, respectively, under the measure Q. The proof requires two lemmas.

Lemma 1. Under the measure Q, the sequence $\{\mu_n\}$ converges a.s. to a
random variable whose distribution is $G_{s,t}$.

Proof. Write $X_n = \mu_n + Z_n$ where $Z_i \sim N(0, 1)$ and are independent.
Thus, for $n \ge 2$

$$\mu_{n+1} = \Bigl(\sum_{i=1}^{n} X_i + s\Bigr)\Big/(n + t) = ((n - 1 + t)\mu_n + \mu_n + Z_n)/(n + t) = \mu_n + Z_n/(n + t).$$

Therefore

$$\mu_n = \mu_1 + \sum_{j=1}^{n-1} Z_j/(j + t).$$

Hence $\mu_n$ converges a.s. as $n \to \infty$ to $\mu_1 + \sum_{i=1}^{\infty} Z_i/(i + t)$, whose
distribution is $G_{s,t}$. □

Lemma 2. $P_Q(T_b < \infty) = 1$.

Proof. By virtue of Lemma 1, $\sum_{i=1}^{n} (\mu_i X_i - \mu_i^2/2) \to \infty$ a.s. (Q)
as $n \to \infty$, from which Lemma 2 follows. □

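Proof of Theorem 1 (continued). Let $\Lambda_n = \Lambda_n(X_1, \ldots, X_n) =
\exp\{\sum_{i=1}^{n} (\mu_i X_i - \mu_i^2/2)\}$. Since $P_Q(T_b < \infty) = 1$,

$$\alpha = P_{H_0}(T_b < \infty) = \sum_{n=1}^{\infty} \int \cdots \int_{\{T_b = n\}} f_{H_0}(x_1, \ldots, x_n)\, dx_1 \cdots dx_n$$
$$= \exp\{-b\} \sum_{n=1}^{\infty} \int \cdots \int_{\{T_b = n\}} \exp\{-(\log \Lambda_n - b)\}\, dP_Q(x_1, \ldots, x_n)$$
$$= \exp\{-b\}\, E_Q \exp\{-(\log \Lambda_{T_b} - b)\}. \tag{3}$$

Thus what is left to be done is a renewal-theoretic analysis of the expectation
in (3).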

Let $0 < \varepsilon < 1$. By virtue of Lemma 1, there exist $0 < d_\varepsilon < a_\varepsilon < \infty$ such
that $P_Q(d_\varepsilon < |\mu_n| < a_\varepsilon$ for all $n \ge 2) \ge 1 - \varepsilon$. Note that if $Y \sim N(\mu, 1)$,
then $\sup_{a > 0} P(Y - a > y \mid Y > a) \to 0$ as $y \to \infty$ uniformly for all $\mu$ in a
compact set. Therefore, there exists $0 < c_\varepsilon < \infty$ such that if $b > c > c_\varepsilon$, then
$P_Q(A_{b,c}) \ge 1 - 2\varepsilon$, where $A_{b,c} = \{\log \Lambda_{T_{b-c}} - (b - c) < c/2\}$.

Choose $c > c_\varepsilon$ and write $w = T_{b-c}$. Note that
$X_{w+j} = \mu_w + \sum_{i=0}^{j-1} Z_{w+i}/(w + i + t) + Z_{w+j}$.
When j remains fixed and $b \to \infty$, then $\sum_{i=1}^{j-1} Z_{w+i}/(w + i) \to 0$
in probability. Leaving c fixed, when $\mu_w \notin (-d_\varepsilon, d_\varepsilon)$, $T_b - w = T_b - T_{b-c}$
is stochastically bounded in Q-probability as $b \to \infty$.

For $\mu \ne 0$, let $H_{\mu,b} = \min\{n \ge 1 : \sum_{i=1}^{n} (\mu(Z_i + \mu) - \mu^2/2) \ge b\}$. Note
that normally distributed variables are strongly nonlattice, so that the
convergence in distribution of $\sum_{i=1}^{H_{\mu,b}} (\mu(Z_i + \mu) - \mu^2/2) - b$ to its
renewal-theoretic limit is uniform on compact sets that do not contain zero (see [16]).
Therefore, for large enough c, for all $d_\varepsilon < |\mu| < a_\varepsilon$, on
$\{\sum_{i=1}^{H_{\mu,b-c}} (\mu(Z_i + \mu) - \mu^2/2) < b - c/2\}$,



(4) 



E 



(^exp|-(^ E (M^. + ^)-^V2)-bj| 



< e 
($\nu(\mu)$ of (1) is the renewal-theoretic limit of the expectation; cf. [13]).
Also, for fixed k 



max 



(M,(X, + ^,)-/«f/2) 

i=w+l,...,w+j 

{fJ'w{Xi+ fly,) - fll/2) 

i=w+l,...,w+j 







as 6 ^ oo. 



Therefore, and because of the stochastic boundedness of $T_b - w$, for all large
enough b, on $A_{b,c} \cap \{d_\varepsilon < |\mu_w| < a_\varepsilon\}$



Eq {exp{- {log Ar^ - b)}\Fw;fiw = y;logA^„ = z) 



(5) 



-El exp 



Y {y{Z, + y)-y'/2)-b]\ 
Y {y{Z, + y)-y'/2) = z) 

=l,...,w ) 



< e. 



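Combining (4) and (5), one obtains that c can be fixed so that for all large
enough b, on $A_{b,c} \cap \{d_\varepsilon < |\mu_w| < a_\varepsilon\}$

$$|E_Q(\exp\{-(\log \Lambda_{T_b} - b)\} \mid F_w) - \nu(\mu_w)| < 2\varepsilon.$$

Since the distribution of $\mu_w$ converges as $b \to \infty$ to $G_{s,t}$, it follows that one
can fix c so that there exists $b_{\varepsilon,c}$ such that for all $b > b_{\varepsilon,c}$

$$\Bigl|E_Q \exp\{-(\log \Lambda_{T_b} - b)\} - \int_{-\infty}^{\infty} \nu(y)\, dG_{s,t}(y)\Bigr| < 6\varepsilon.$$

Letting $\varepsilon \to 0$ concludes the proof of Theorem 1. □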



As the preceding analysis suggests, every stopping rule of the Robbins- 
Siegmund type has a mixture analog. For example, the mixture-type analog 
of the rule described in Theorem 1 is 



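$$T_b = \min\Bigl\{n \Bigm| \int_{-\infty}^{\infty} \exp\Bigl\{y \sum_{i=1}^{n} X_i - n y^2/2\Bigr\}\, dG_{s,t}(y) \ge \exp\{b\}\Bigr\}$$
$$= \min\Bigl\{n \Bigm| \exp\Bigl\{\frac{(s/t + v^2(t) \sum_{i=1}^{n} X_i)^2}{2v^2(t)(n v^2(t) + 1)} - \frac{(s/t)^2}{2v^2(t)}\Bigr\} \times (n v^2(t) + 1)^{-1/2} \ge \exp\{b\}\Bigr\},$$

and [2] the asymptotic expression for its level of significance $P_{H_0}(T_b < \infty)$
is the same as (2). For a given level of significance $\alpha$, both rules have
(approximately) the same stopping threshold $\exp\{b\}$.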
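As a numerical check on the mixture statistic, the sketch below compares a closed form for the normal mixture likelihood ratio with direct quadrature. It assumes the mixing distribution is $N(m, v^2)$ with $m = s/t$ and $v^2 = v^2(t)$; the closed form is obtained by completing the square and includes the constant factor $\exp\{-m^2/(2v^2)\}$. Function names and grid settings are illustrative choices only.

```python
import math

def mixture_lr_closed_form(sum_x, n, m, v2):
    """Integral of exp(y*sum_x - n*y^2/2) against the N(m, v2) density, in closed form."""
    exponent = (m + v2 * sum_x) ** 2 / (2.0 * v2 * (n * v2 + 1.0)) - m * m / (2.0 * v2)
    return math.exp(exponent) / math.sqrt(n * v2 + 1.0)

def mixture_lr_quadrature(sum_x, n, m, v2, grid=20_001, width=12.0):
    """Same integral by the trapezoid rule on m +/- width standard deviations."""
    sd = math.sqrt(v2)
    lo, hi = m - width * sd, m + width * sd
    h = (hi - lo) / (grid - 1)
    total = 0.0
    for j in range(grid):
        y = lo + j * h
        density = math.exp(-(y - m) ** 2 / (2.0 * v2)) / math.sqrt(2.0 * math.pi * v2)
        weight = 0.5 if j in (0, grid - 1) else 1.0  # trapezoid end-point weights
        total += weight * math.exp(y * sum_x - 0.5 * n * y * y) * density
    return total * h
```

Agreement of the two functions for a few values of the running sum and n supports the algebra; the quadrature grid is only suitable when the integrand's mass lies within the chosen window.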

Robbins and Siegmund [10] noted that in the continuous-time (Brown- 
ian motion) case the two rules are identical. In the discrete-time case a 
comparison between them is of interest. Following the methods of Pollak 
and Siegmund [6], Robbins and Siegmund [10], Lai and Siegmund [2] and 
Woodroofe [17], one can show that, for any fixed $\mu \ne 0$, the difference
between the expected sample sizes of the two stopping rules is O(1) as $\alpha \to 0$.
Specifically, letting rt = lim„^oo(Z]i=i,...,n + ~ logn), it can be shown 
that 



1^ 



n 



t E l/{j+tf-2log{ E l/(j + tf)+l 



{^,-s/tf 



+ o{l) 

'':2.'^-^{g(t) + {fi-s/t)^h{t)} + 0{l). 

Tedious calculations show that $g(t) > 0$ and $h(t) > 0$ for all $t > 0$. Thus the
mixture rule studied in this section is asymptotically uniformly (in $\mu$) better
(by at most an additive constant) than its Robbins-Siegmund analog. This
extends a result of Pollak and Yakir [8].



3. A second example: hypothesis testing and detecting a change of the 
shape parameter of a Gamma distribution. As indicated in Section 2, 
a mixture procedure seems preferable for the normal mean problem. However,
there are cases where mixtures are hard to apply, such as distributions
that do not admit a conjugate prior, especially when the parameter is
multidimensional. In such cases a Robbins-Siegmund scheme would be of value.
In this section, we illustrate this by setting up a power one test and a change- 
point detection scheme for the shape parameter of a Gamma distribution. 
The considerations involved are typical of more complex problems. 

A power one test. Let $X_1, X_2, \ldots$ be i.i.d. Gamma($\theta$, 1)-distributed random
variables, and let $H_0: \theta = \theta_0$, $H_1: \theta \ne \theta_0$, where $0 < \theta_0 < \infty$ is fixed. This
is an example where there is no "natural" mixture; the Gamma($\theta$, 1) family
has no conjugate prior.

Following the considerations of Section 2, we need to define a sequence
of estimates of $\theta$ that are $F_{n-1}$-measurable. A comparison of estimators will
be made in Section 4. In this section we consider a particularly simple yet
flexible approach based upon the method of moments. Let $0 \le s, t < \infty$ and
define $\theta_1 = s/t$ ($= \theta_0$ if $s \wedge t = 0$) and $\theta_{n+1} = (n\bar X_n + s)/(n + t)$ for $n \ge 1$.
Define a Robbins-Siegmund type of test with

$$\Lambda_n = \prod_{i=1}^{n} \bigl[X_i^{\theta_i - \theta_0}\, \Gamma(\theta_0)/\Gamma(\theta_i)\bigr]$$

as its test statistic and

$$T_b = \min\{n \mid n \ge 1, \Lambda_n \ge e^b\}$$

($T_b = \infty$ if no such n exists) as its stopping time. For $\theta \ne \theta_0$ let

$$\eta(\theta) = \lim_{b \to \infty} E_\theta \exp\Bigl\{-\Bigl(\sum_{i=1}^{M_b} \log(f_\theta(Z_i)/f_{\theta_0}(Z_i)) - b\Bigr)\Bigr\},$$

where $Z_1, Z_2, \ldots$ are i.i.d. Gamma($\theta$, 1)-distributed random variables, $f_\theta$ is
the Gamma($\theta$, 1) density and
$M_b = \min\{n \mid n \ge 1, \sum_{i=1}^{n} \log(f_\theta(Z_i)/f_{\theta_0}(Z_i)) \ge b\}$.



Theorem 2. When $\theta_1 = s/t$ ($= \theta_0$ if $s \wedge t = 0$) and $\theta_{n+1} = (n\bar X_n + s)/(n + t)$
for $n \ge 1$, there exists a probability measure G (that depends on
$\theta_0$, s, t) on $(0, \infty)$ such that the Robbins-Siegmund test of $H_0: \theta = \theta_0$,
$H_1: \theta \ne \theta_0$ has significance level

$$\alpha = P_{H_0}(T_b < \infty) = (1 + o(1)) \times \gamma \times \exp\{-b\}, \tag{6}$$

where $\gamma = \int \eta(y)\, dG(y)$ and $o(1) \to 0$ as $b \to \infty$. The test has power one.

Remark. Although the constant $\gamma$ and the measure G do not admit an
analytic expression, they can be evaluated easily by Monte Carlo, as will
be shown in Section 4. This turns Theorem 2 into a practical tool, as the
significance level can be approximately regulated [by letting $b = \log(\gamma/\alpha)$]
once $\gamma$ has been evaluated. The choice of s, t influences the ASN when $\theta \ne \theta_0$,
as discussed in Section 4.
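A minimal sketch of the power one test for the Gamma shape parameter follows. The function names are hypothetical; the log-likelihood ratio uses the Gamma($\theta$, 1) density $x^{\theta-1}e^{-x}/\Gamma(\theta)$, and the estimate applied to observation n is $(\sum_{i<n} X_i + s)/(n - 1 + t)$, with $\theta_1 = \theta_0$ when $s \wedge t = 0$, as in the definition above.

```python
import math

def gamma_log_lr(x, theta, theta0):
    """log f_theta(x) / f_theta0(x) for Gamma(shape, scale 1) densities."""
    return (theta - theta0) * math.log(x) + math.lgamma(theta0) - math.lgamma(theta)

def rs_gamma_test(xs, b, theta0, s=1.0, t=1.0):
    """Stopping time T_b (1-based) of the Robbins-Siegmund type test, or None."""
    log_lambda = 0.0
    running_sum = 0.0  # X_1 + ... + X_{n-1}
    for n, x in enumerate(xs, start=1):
        if n == 1:
            theta_n = s / t if min(s, t) > 0 else theta0
        else:
            theta_n = (running_sum + s) / ((n - 1) + t)
        # theta_n was computed before x was used, so the estimate is nonanticipating
        log_lambda += gamma_log_lr(x, theta_n, theta0)
        if log_lambda >= b:
            return n
        running_sum += x
    return None
```

Given an evaluated $\gamma$, one would call this with `b = math.log(gamma / alpha)` to target significance level $\alpha$ approximately, following the Remark.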



Sketch of proof of Theorem 2. Under $P_\theta$, clearly $\theta_n \to \theta$ a.s. as
$n \to \infty$, so the test has power one, as can be seen easily by the methods of
Robbins and Siegmund [9].

The proof of (6) goes along the same lines as that of Theorem 1. Let Q be
the measure on $\{X_1, X_2, \ldots\}$ under which the distribution of $X_n$ conditional
on $X_1, \ldots, X_{n-1}$ is Gamma($\theta_n$, 1). We will prove an analog of Lemma 1. The
rest of the proof of (6) is essentially the same as that of Theorem 1, so we
omit the details.
Lemma 1*. Under the measure Q, the sequence $\{\theta_n\}$ converges a.s. to
a positive random variable whose distribution is G.

Proof. By direct calculation, note that under Q the sequence $\{\theta_n\}$
is a martingale with expectation $\theta_1 = s/t$ ($= \theta_0$ if $s \wedge t = 0$). Therefore
$E_Q(\exp\{-\theta_{n+1}\} \mid F_{n-1}) \ge \exp\{-E_Q(\theta_{n+1} \mid F_{n-1})\} = \exp\{-\theta_n\}$. Thus $\exp\{-\theta_n\}$
is a bounded submartingale under Q and therefore it has an a.s. limit.
Consequently, $\{\theta_n\}$ has an a.s. limit. It is left to prove that this limit is a.s.
positive and finite. (If the limit were concentrated on 0 and $\infty$, then
Theorem 2 would not be of practical value.)

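Note that

$$\mathrm{Var}_Q\, \theta_n = \mathrm{Var}_Q\, E_Q(\theta_n \mid F_{n-2}) + E_Q \mathrm{Var}_Q(\theta_n \mid F_{n-2})$$
$$= \mathrm{Var}_Q\, \theta_{n-1} + E_Q \theta_{n-1}/(t + n - 1)^2$$
$$= \mathrm{Var}_Q\, \theta_{n-1} + \theta_1/(t + n - 1)^2.$$

Therefore

$$\mathrm{Var}_Q\, \theta_n = \sum_{i=2}^{n} (\mathrm{Var}_Q\, \theta_i - \mathrm{Var}_Q\, \theta_{i-1}) \le \sum_{i=1}^{\infty} \theta_1/(t + i)^2,$$

so that the limiting distribution of $\theta_n$ does not have an atom at $\infty$.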

Now consider $\varphi_n(y) = E_Q \exp\{-y\theta_n\}$ for $y > 0$. To show that the limiting
distribution of $\theta_n$ does not have an atom at 0, it suffices to show that
$[\lim_{n \to \infty} \varphi_n(y)] \to 0$ as $y \to \infty$. Each $\varphi_n(y)$ is decreasing in y and by the
bounded convergence theorem is seen to be continuous and to have limit
zero at $+\infty$. Denote the inverse function by $\varphi_n^{-1}$. If $0 < \varepsilon < 1$ and the
sequence $\{\varphi_n^{-1}(\varepsilon)\}_{n=1}^{\infty}$ is bounded above by M (say), then, for $y > M$,
$\lim_{n \to \infty} \varphi_n(y) \le \lim_{n \to \infty} \varphi_n(\varphi_n^{-1}(\varepsilon)) = \varepsilon$. It therefore suffices to show that,
for each $0 < \varepsilon < 1$, $\{\varphi_n^{-1}(\varepsilon)\}$ is bounded above.

Recalling that $E(\exp\{-r \times \mathrm{Gamma}(\theta, 1)\}) = (1 + r)^{-\theta}$,

$$\varphi_{n+1}(y) = E_Q E_Q\Bigl(\exp\Bigl\{-y\Bigl(s + \sum_{i=1}^{n} X_i\Bigr)\Big/(t + n)\Bigr\} \Bigm| F_{n-1}\Bigr)$$
$$= E_Q(\exp\{-y\theta_n(t + n - 1)/(t + n) - \theta_n \log(1 + y/(t + n))\})$$
$$= \varphi_n(y(t + n - 1)/(t + n) + \log(1 + y/(t + n))).$$

Defining $y_n = \varphi_n^{-1}(\varepsilon)$ and setting $y = y_{n+1}$ in the previous line yields

$$\varphi_n(y_n) = \varepsilon = \varphi_{n+1}(y_{n+1}) = \varphi_n(y_{n+1}(t + n - 1)/(t + n) + \log(1 + y_{n+1}/(t + n)))$$
and therefore

$$y_n = y_{n+1}(t + n - 1)/(t + n) + \log(1 + y_{n+1}/(t + n)).$$

It remains to show that $\{y_n\}$ is bounded above. Letting $\gamma_{n+1} = y_{n+1}/(t + n)$,

$$\gamma_n = \gamma_{n+1} + (\log(1 + \gamma_{n+1}))/(t + n - 1). \tag{7}$$

Clearly, $\{\gamma_n\}$ is decreasing and must have a limit $q \ge 0$. Hence

$$q - \gamma_2 = \sum_{n=2}^{\infty} (\gamma_{n+1} - \gamma_n) = -\sum_{n=2}^{\infty} (\log(1 + \gamma_{n+1}))/(t + n - 1) > -\infty.$$

Since $\log(1 + \gamma_{n+1}) \ge \log(1 + q)$, evidently q = 0. Thus

$$\gamma_2 = \sum_{n=2}^{\infty} (\log(1 + \gamma_{n+1}))/(t + n - 1),$$

and since $\log(1 + \gamma_{n+1}) \ge (1/2)\gamma_{n+1}$ for large n,

$$\sum_{n=2}^{\infty} \gamma_{n+1}/(t + n - 1) < \infty. \tag{8}$$

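For all n

$$\log(1 + \gamma_{n+1}) \ge \gamma_{n+1} - \gamma_{n+1}^2,$$

and by (7)

$$\gamma_n \ge \gamma_{n+1} + (\gamma_{n+1} - \gamma_{n+1}^2)/(t + n - 1) = \gamma_{n+1}(t + n)/(t + n - 1) - \gamma_{n+1}^2/(t + n - 1).$$

Multiplying by $t + n - 1$,

$$y_n \ge y_{n+1} - \gamma_{n+1}^2 = y_{n+1}(1 - \gamma_{n+1}/(t + n)),$$

so that for $n \ge k$ sufficiently large the right-hand side is positive and

$$y_k/y_{m+1} = \prod_{n=k}^{m} y_n/y_{n+1} \ge \prod_{n=k}^{m} (1 - \gamma_{n+1}/(t + n)) \ge \prod_{n=k}^{\infty} (1 - \gamma_{n+1}/(t + n)) > 0$$

by (8). Thus $y_{m+1}$ is bounded above, as required. □

Theorem 3. Define the Fisher and Kullback-Leibler information numbers

$$I(\theta) = -E_\theta[\partial^2 \log f_\theta(X)/\partial\theta^2] = d^2(\log \Gamma(\theta))/d\theta^2,$$
$$I(\theta, \phi) = E_\theta \log[f_\theta(X)/f_\phi(X)] = (\theta - \phi)\, d(\log \Gamma(\theta))/d\theta - \log[\Gamma(\theta)/\Gamma(\phi)].$$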
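Let $\{\theta_n\}$ be a sequence of $F_{n-1}$-measurable estimators of $\theta$ with asymptotic
efficiency $\kappa(\theta)$ in the sense that

$$E_\theta(\theta_n - \theta)^2 = (1 + o(1))/[n I(\theta)\kappa(\theta)] \quad \text{as } n \to \infty \text{ for all } \theta.$$

Assume that

$$E_\theta(\theta_n - \theta)^4 = O(n^{-2}) \quad \text{as } n \to \infty$$

and that there exists $c > 0$ depending on $\theta$ such that

$$E_\theta\Bigl[\sum_{n=1}^{\infty} (\log \theta_n^{-1})^{+}\, 1(\theta_n < c)\Bigr] < \infty.$$

Let $T_b$ be defined by the estimator sequence $\{\theta_n\}$. Then for all $\theta$, as $b \to \infty$

$$E_\theta T_b = I(\theta, \theta_0)^{-1}\bigl(b + (\log b)/(2\kappa(\theta))\bigr) + o(\log b).$$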

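Sketch of proof. To make the writing easier, use N as a shorthand
for $T_b$. Standard estimates show that

$$E_\theta \log \Lambda_N = b + O(1),$$

and letting $\Lambda_n^\theta = f_\theta(X_1, \ldots, X_n)/f_{\theta_0}(X_1, \ldots, X_n)$ it follows using Wald's
equation that

$$I(\theta, \theta_0) E_\theta N = b + O(1) + E_\theta(\log \Lambda_N^\theta - \log \Lambda_N).$$

Applying the martingale optional sampling theorem to
$\{\log \Lambda_n^\theta - \log \Lambda_n - \sum_{k=1}^{n} I(\theta, \theta_k)\}$, it remains to show that

$$E_\theta\Bigl(\sum_{k=1}^{N} I(\theta, \theta_k)\Bigr) = (\log b)/(2\kappa(\theta)) + o(\log b).$$

Fix $0 < c < \theta$. To verify that there is an $A > 0$ such that for all $\phi$

$$|I(\theta, \phi) - \tfrac{1}{2}(\phi - \theta)^2 I(\theta)| \le A|\phi - \theta|^3 + (\log \phi^{-1})^{+}\, 1(\phi < c),$$

note the following: the inequality holds for $c \le \phi \le \theta + c$ by Taylor expansion
of $\log \Gamma(\phi)$ about $\phi = \theta$; it holds for $0 < \phi < c$ since

$$\log \Gamma(\phi) = \log \Gamma(1 + \phi) - \log \phi \le \text{const} + (\log \phi^{-1})^{+};$$

and it holds for $\phi > \theta + c$ since Stirling's approximation yields as $\phi \to \infty$,

$$I(\theta, \phi) \le \phi \log \phi + O(\phi) \le O(|\phi - \theta|^3).$$

Thus

$$\Bigl|E_\theta \sum_{k=1}^{N} \bigl(I(\theta, \theta_k) - \tfrac{1}{2}(\theta_k - \theta)^2 I(\theta)\bigr)\Bigr| \le \sum_{k=1}^{\infty} E_\theta\bigl(A|\theta_k - \theta|^3 + (\log \theta_k^{-1})^{+}\, 1(\theta_k < c)\bigr),$$

and since the hypotheses imply that the series converges, it suffices to show
that

$$E_\theta\Bigl(\sum_{k=1}^{N} (\theta_k - \theta)^2\Bigr) = (\log b)/[I(\theta)\kappa(\theta)] + o(\log b).$$

Routine modifications of the arguments used to prove Lemmas 13, 14 and
16 of Robbins and Siegmund [10] establish that

$$E_\theta N \sim b/I(\theta, \theta_0) \quad \text{as } b \to \infty$$

and that by using the definition of $\kappa(\theta)$

$$E_\theta\Bigl(\sum_{k=1}^{N} (\theta_k - \theta)^2\Bigr) \sim \sum_{k=1}^{[b\, I(\theta, \theta_0)^{-1}]} E_\theta(\theta_k - \theta)^2 \sim (\log b)/[I(\theta)\kappa(\theta)].$$

□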

Detecting a change. Now we suppose that when the process being monitored
is in control, it yields i.i.d. Gamma-distributed observations, and when
the process is out of control the shape parameter changes. Formally stated,
an abrupt change may occur at time $\nu$, in which case $X_1, X_2, \ldots, X_{\nu-1}$ are
i.i.d. Gamma($\theta_0$, 1)-distributed random variables and $X_\nu, X_{\nu+1}, \ldots$ are i.i.d.
Gamma($\theta$, 1)-distributed random variables (which are independent of the
first $\nu - 1$ observations). The initial shape parameter $\theta_0$ is assumed to be
known, but the post-change parameter $\theta$ as well as the changepoint $\nu$ are
unknown. We will let $P_\nu$ and $E_\nu$ denote probability and expectation under this
scheme, where $\nu = \infty$ denotes no change ever taking place. If the post-change
$\theta$ were known, the Shiryayev-Roberts changepoint detection scheme would
define the sequence of statistics
$R_n^\theta = \sum_{k=1}^{n} f_\theta(X_k, \ldots, X_n)/f_{\theta_0}(X_k, \ldots, X_n)$
and the associated stopping time $N_A^\theta = \min\{n \mid R_n^\theta \ge A\}$. The sequence
$\{R_n^\theta - n\}$ is a $P_\infty$-martingale with zero expectation, a structure used to evaluate
the ARL to false alarm of $N_A^\theta$. When $\theta$ is unknown, we propose to estimate
it in a way that will preserve the martingale structure. Again the idea is to
substitute $F_{n-1}$-measurable estimates for the $\theta$ used in the likelihood ratio
of $X_n$.

We present two examples. The first uses a method-of-moments estimator 
for 9, which leads to simple calculations and a correspondingly simple expo- 
sition of the issues involved in proofs. Our second example uses maximum 
likelihood estimation, which is asymptotically efficient but requires more 
calculation to apply and more delicate mathematical analysis. We provide a 
Monte Carlo comparison of the two methods in the next section. 
Example (An SRRS procedure based on estimation by the method of
moments). Given $s, t \ge 0$, define

$$\theta_{n,k} = \Bigl(\sum_{i=k}^{n-1} X_i + s\Bigr)\Big/(n - k + t)$$

for $n \ge k$, where $\theta_{k,k} = \theta_0$ if $s \wedge t = 0$,

$$\Lambda_{n,k} = \prod_{i=k}^{n} [f_{\theta_{i,k}}(X_i)/f_{\theta_0}(X_i)], \tag{9}$$

$$R_n = \sum_{k=1}^{n} \Lambda_{n,k},$$

$$N_A = \min\{n \mid R_n \ge A\},$$

where the stopping threshold A is fixed.

Results regarding the operating characteristics of this SRRS procedure 
are stated in Theorem 4, whose proof is given in the Appendix. Part (iii) of 
Theorem 4 illustrates the effect of the asymptotic efficiency of the estimation 
procedure on the delay to detection. 
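The SRRS procedure with method-of-moments estimates can be sketched directly from its definition. This is an illustrative, unoptimized sketch (the function names are hypothetical): it keeps one running sum and one log-statistic per candidate changepoint k, so step n costs O(n) and a run of length n costs O(n^2).

```python
import math

def gamma_log_lr(x, theta, theta0):
    """log f_theta(x) / f_theta0(x) for Gamma(shape, scale 1) densities."""
    return (theta - theta0) * math.log(x) + math.lgamma(theta0) - math.lgamma(theta)

def srrs_mom_stopping_time(xs, A, theta0, s=1.0, t=1.0):
    """First n with R_n >= A for the method-of-moments SRRS scheme, or None."""
    sums = []      # sums[j]: X_k + ... + X_{n-1} for candidate changepoint k = j + 1
    log_lams = []  # log_lams[j]: log Lambda_{n-1,k}, updated in place to log Lambda_{n,k}
    for n, x in enumerate(xs, start=1):
        sums.append(0.0)      # new candidate changepoint k = n
        log_lams.append(0.0)
        for j in range(n):
            count = n - 1 - j  # number of observations X_k..X_{n-1}, i.e. n - k
            if count == 0 and min(s, t) == 0:
                theta = theta0
            else:
                theta = (sums[j] + s) / (count + t)
            # the estimate uses X_k..X_{n-1} only, never the current x
            log_lams[j] += gamma_log_lr(x, theta, theta0)
            sums[j] += x
        r_n = sum(math.exp(v) for v in log_lams)
        if r_n >= A:
            return n
    return None
```

A quick sanity check: with $\theta_0 = 1$, s = t = 1 and all observations equal to 1, every estimate equals 1, every $\Lambda_{n,k}$ equals 1, and $R_n = n$, so the rule with A = 10 stops at n = 10.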

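Theorem 4. For a Shiryayev-Roberts-Robbins-Siegmund scheme defined by (9),

(i) $E_\infty N_A \ge A$ for all $0 < A < \infty$,

(ii) $\lim_{A \to \infty} E_\infty N_A / A = 1/\gamma$, where $\gamma$ is the same as in Theorem 2,

(iii) $\sup_\nu E_{\theta,\nu}(N_A - \nu + 1 \mid N_A \ge \nu) = \{\log A + \tfrac{1}{2}(\log\log A)/\kappa(\theta) + o(\log\log A)\}/E_\theta \log(f_\theta(X)/f_{\theta_0}(X))$,
where $\kappa(\theta) = 1/(\theta I(\theta))$ and $I(\theta)$ is Fisher information.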

Remark. Theorem 4 provides a basis for applying this
Shiryayev-Roberts-Robbins-Siegmund changepoint detection scheme. If one requires an ARL to
false alarm of at least B, one can obtain a (conservative) scheme by setting
A = B, or, after evaluating $\gamma$, a scheme that approximately satisfies the
condition by setting $A = B\gamma$. It is possible to obtain asymptotic expressions for
the expected delay to detection, but they have constants which have to be 
evaluated by Monte Carlo separately for each post-change parameter value 
9, and since the expressions do not yield good enough approximations for 
cases of applied interest {B of the order of magnitude 10^-10^), we do not 
present them here. 

Example (An efficient estimating sequence). Theorem 3 implies that
better performance can be realized if the estimating sequence is efficient.
In this subsection, we apply the (efficient) maximum likelihood estimator
sequence instead of the method-of-moments type of sequence studied in the
previous two sections. In Theorem 5 we give a formal definition of the
procedure and state results regarding its operating characteristics. Proofs appear
in the Appendix.

Let $Q^*$ be the measure on $\{X_1, X_2, \ldots\}$ under which $X_1 \sim$ Gamma($\theta_0$, 1)
and for $n > 1$ the distribution of $X_n$ conditional on $X_1, \ldots, X_{n-1}$ is Gamma($\theta_n$, 1),
where $\theta_n$ = the solution of $(\sum_{i=1}^{n-1} \log X_i)/(n - 1) = E_\theta \log X$, which is
easily seen to be the maximum likelihood estimate of $\theta$ based on $X_1, \ldots, X_{n-1}$.
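Since $E_\theta \log X = \Gamma'(\theta)/\Gamma(\theta)$ (the digamma function) for the Gamma($\theta$, 1) family, the estimating equation can be solved numerically. The sketch below is one possible way to do this and is not the authors' implementation: the digamma routine (recurrence plus an asymptotic series), the bisection on a logarithmic scale, and the bracketing interval are all assumptions made for illustration.

```python
import math

def digamma(x):
    """Digamma function for x > 0: shift x upward by recurrence, then use an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # psi(x) ~ log x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 * inv - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result

def gamma_shape_mle(xs, lo=1e-6, hi=1e6, iters=200):
    """Solve digamma(theta) = mean(log x) by bisection (digamma is increasing)."""
    target = sum(math.log(x) for x in xs) / len(xs)
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint suits the wide bracket
        if digamma(mid) < target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)
```

In an SRRS implementation this solver would be called once for each pair (k, n), which is why the maximum likelihood version is much slower than the method-of-moments version.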

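Theorem 5. (a) Under the measure $Q^*$, the sequence $\{\theta_n\}$ converges
a.s. to a positive random variable whose distribution we denote by G.
(b) For a Shiryayev-Roberts-Robbins-Siegmund scheme defined by

$$\theta_{n,k} = \text{solution of } \Bigl(\sum_{i=k}^{n-1} \log X_i\Bigr)\Big/(n - k) = E_\theta \log X$$

for $n > k$, where $\theta_{k,k} = \theta_0$;

$$\Lambda_{n,k} = \prod_{i=k}^{n} [f_{\theta_{i,k}}(X_i)/f_{\theta_0}(X_i)],$$

$$R_n = \sum_{k=1}^{n} \Lambda_{n,k},$$

$$N_A = \min\{n \mid R_n \ge A\},$$

the following hold:

(i) $E_\infty N_A \ge A$ for all $0 < A < \infty$.

(ii) $\lim_{A \to \infty} E_\infty N_A / A = 1/\gamma$, where $\gamma = \int \eta(\theta)\, dG(\theta)$ [$\eta(\cdot)$ is defined in
Section 3].

(iii) $\sup_\nu E_{\theta,\nu}(N_A - \nu + 1 \mid N_A \ge \nu) = \{\log A + \tfrac{1}{2}(\log\log A) + o(\log\log A)\}/E_\theta \log(f_\theta(X)/f_{\theta_0}(X))$.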

Remark. For the maximum likelihood procedure, one can retain the
flexibility of the method-of-moments procedure by introducing the parameters
s, t by defining $\theta_{n,k}$ = solution of $(s + \sum_{i=k}^{n-1} \log X_i)/(t + n - k) = E_\theta \log X$.
Also, it may make sense to restrict the allowable set of $\theta$'s to be
bounded away from 0 and from $\infty$, and perhaps also from $\theta_0$. Although this
may make the expected delay to detection inefficient for the truncated pa- 
rameter values, one can argue that they are not of practical interest, whereas 
their truncation will improve this scheme's performance for the retained set 
of parameter values. 

Remark. For all SRRS changepoint detection procedures designed for
the case that the $P_\infty$-distribution is known, $\sup_\nu E_{\theta,\nu}(N_A - \nu + 1 \mid N_A \ge \nu)$
is attained at $\nu = 1$. The reason for this is that the $P_{\nu=j}$-behavior of
$\{\sum_{k=j}^{n} \Lambda_{n,k}\}_{n=j}^{\infty}$ conditional on $F_{j-1}$ is the same as the $P_{\nu=1}$-behavior
of $\{\sum_{k=1}^{n} \Lambda_{n,k}\}_{n=1}^{\infty}$. Let $N_A^j = \min\{n \mid n \ge j, \sum_{k=j}^{n} \Lambda_{n,k} \ge A\}$. Clearly,
$E_{\nu=1} N_A = E_{\nu=j}\{N_A^j - j + 1 \mid N_A \ge j\}$. However, $R_n \ge \sum_{k=j}^{n} \Lambda_{n,k}$ for all
$n \ge j$. Therefore, $N_A^j \ge N_A$ on $\{N_A \ge j\}$, so that for all $j \ge 1$

$$E_{\nu=1}(N_A - 1 + 1) = E_{\nu=1} N_A = E_{\nu=j}\{N_A^j - j + 1 \mid N_A \ge j\} \ge E_{\nu=j}\{N_A - j + 1 \mid N_A \ge j\}.$$

4. Monte Carlo. In this section we present a numerical illustration of
the methods proposed in the previous section. We suppose that the
pre-change distribution is standard exponential, that is, $\theta_0 = 1$. First, we
consider the schemes defined by $\theta_1 = 1$, $\theta_{n+1} = (n\bar X_n + t)/(n + t)$ (i.e., s = t) for
t = 0, 0.5, 1.

The first step is to evaluate the constant $\gamma$ (see Theorem 2). For each
value of t, 5000 replications of $\exp\{-(\log(\Lambda_{T_b}) - b)\}$ were run for several
ranges of b with $\{X_i\}$ distributed under the measure Q of Lemma 1*. The
average of these replications is our estimate of $\gamma$. The results are summarized
in Table 1. (See the Appendix for a detailed description of the method of
simulation.)

It seems that for $10 \le b \le 25$ the expectation of $\exp\{-(\log(\Lambda_{T_b}) - b)\}$
is nearly constant and hence presumably close to its limiting value. We
obtain that $\gamma \approx$ 0.425, 0.547, 0.606 for t = 0, 0.5, 1, respectively. Next, we ran
10,000 replications to calibrate the Shiryayev-Roberts-Robbins-Siegmund
detection scheme to have 500, 750, 1000 as the ARL to false alarm. The
results are summarized in Table 2. By Theorem 4(ii), the ratio of A to the
ARL to false alarm is asymptotically equal to $\gamma$, and, judging by Table
2, the values of A seem to be large enough for the asymptotics to yield
good approximations. In other words, setting $A = \gamma \times$ (desired ARL to
false alarm) will achieve the desired ARL to within very few percent. This
makes Theorem 4 a practical tool: rather than calibrating A for each ARL
separately (which is computationally demanding), it is enough to run a
simulation to evaluate $\gamma$ (which takes just a minute or two) and multiply
the result by the desired ARL.
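The calibration rule $A = \gamma \times$ ARL can be compared with the Monte Carlo thresholds reported in this section. The short sketch below hard-codes the estimates of $\gamma$ and the calibrated values of A quoted in Tables 1 and 2; the variable and function names are illustrative only.

```python
# Estimates of gamma for t = 0, 0.5, 1 as quoted in the text
gamma_hat = {0.0: 0.425, 0.5: 0.547, 1.0: 0.606}

# Thresholds A calibrated by Monte Carlo for each target ARL (Table 2)
calibrated_A = {
    0.0: {500: 221, 750: 320, 1000: 440},
    0.5: {500: 275, 750: 410, 1000: 555},
    1.0: {500: 309, 750: 456, 1000: 578},
}

def threshold_from_gamma(gamma, arl):
    """Asymptotic calibration: A = gamma * (desired ARL to false alarm)."""
    return gamma * arl

def relative_gaps():
    """List of (t, arl, predicted A, calibrated A, relative gap)."""
    rows = []
    for t, by_arl in calibrated_A.items():
        for arl, a_mc in by_arl.items():
            a_pred = threshold_from_gamma(gamma_hat[t], arl)
            rows.append((t, arl, a_pred, a_mc, (a_pred - a_mc) / a_mc))
    return rows

if __name__ == "__main__":
    for t, arl, a_pred, a_mc, gap in relative_gaps():
        print(f"t={t:<4} ARL={arl:<5} predicted A={a_pred:7.2f} "
              f"calibrated A={a_mc:4d} gap={100 * gap:+.1f}%")
```

For these nine cases the predicted and calibrated thresholds differ by less than 5% in every case.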



Table 1
Monte Carlo estimates of $\gamma$ for three values of t, averaged over three b-intervals

b-interval          t = 0               t = 0.5              t = 1
[B_0, B_1]     est. γ     s.e.     est. γ     s.e.     est. γ     s.e.
[10, 15]       0.4290    0.0044    0.5472    0.0039    0.6065    0.0035
[15, 20]       0.4256    0.0044    0.5502    0.0039    0.6050    0.0036
[20, 25]       0.4215    0.0044    0.5430    0.0040    0.6061    0.0036

Table 2
Levels A (evaluated by Monte Carlo) designed to achieve desired ARLs to false alarm
and their relation to $\gamma$, for various values of t

t                           0                      0.5                      1
ARL to false alarm    500    750   1000      500    750   1000      500    750   1000
A                     221    320    440      275    410    555      309    456    578
A/ARL               0.442  0.427  0.440    0.550  0.547  0.555    0.618  0.608  0.578
γ                          0.425                  0.547                  0.606



Table 3 presents a comparison of the (maximal) expected delay to detec- 
tion of three methods-of-moments-based procedures of the kind described in 
Theorem 4 (s = t and t = 0, 0.5, 1) calculated as the average of 10,000 run
lengths to detection (when the change is in effect from the start) for each of
the post-change parameter values $\theta$ = 0.35, 0.5, 0.65, 0.8, 1.25, 1.5, 1.75, 2, 2.5, 3,
for ARL to false alarm 1000. The differences are not dramatic, though the
choice t = 1 seems to give overall performance slightly better than the others.

Also included in Table 3 is a simulation study of the maximum-likelihood-
based scheme. The maximum-likelihood-based scheme performs slightly better
overall, though it has larger delays to detection when the post-change $\theta$
is less than 1. The calculation of the many maximum likelihood estimates
required to perform the SRRS procedure is of course considerably slower than
the calculation of the method-of-moments estimators. For each k and n the
estimate is obtained by solving numerically for the value of $\theta$ such that
$\Gamma'(\theta)/\Gamma(\theta)$ equals the average of $\log(X_k), \ldots, \log(X_n)$.

A central question to be answered is how well do the procedures proposed
here compare with simple schemes. For example, since the problem we have
been considering is a two-sided problem (the post-change value of $\theta$ may be
either larger or smaller than $\theta_0$), a simple alternative method is to choose
two values $0 < \theta_1 < \theta_0 < \theta_2 < \infty$, put a prior probability of 50% on $\theta_1$ and on
$\theta_2$, and apply the corresponding Shiryayev-Roberts rule; that is, the control
statistic is

$$R(n) = \tfrac{1}{2} R_{\theta_1}(n) + \tfrac{1}{2} R_{\theta_2}(n),$$

where $R_{\theta_j}(n)$ are the usual simple Shiryayev-Roberts statistics designed for
detecting a change from $\theta_0$ to $\theta_j$ (j = 1, 2); that is,

$$R_{\theta_j}(n) = \sum_{k=1}^{n} \Lambda_{k,\theta_j}(n), \quad j = 1, 2,$$

where

$$\Lambda_{k,\theta_j}(n) = \prod_{i=k}^{n} \bigl[X_i^{\theta_j - \theta_0}\, \Gamma(\theta_0)/\Gamma(\theta_j)\bigr], \quad j = 1, 2,$$

and the stopping time is

$$S_A = \min\{n \mid R(n) \ge A\}.$$
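Because each simple Shiryayev-Roberts statistic satisfies the recursion $R_{\theta_j}(n) = (1 + R_{\theta_j}(n-1)) \cdot f_{\theta_j}(X_n)/f_{\theta_0}(X_n)$ with $R_{\theta_j}(0) = 0$, the two-point mixture rule is cheap to compute. The following sketch uses that standard recursion; the function names are hypothetical.

```python
import math

def gamma_lr(x, theta, theta0):
    """f_theta(x) / f_theta0(x) for Gamma(shape, scale 1) densities."""
    return math.exp((theta - theta0) * math.log(x)
                    + math.lgamma(theta0) - math.lgamma(theta))

def two_point_sr_stopping_time(xs, A, theta0, theta1, theta2):
    """First n with (R_theta1(n) + R_theta2(n)) / 2 >= A, or None."""
    r1 = r2 = 0.0
    for n, x in enumerate(xs, start=1):
        # recursion equivalent to summing the products over all k = 1..n
        r1 = (1.0 + r1) * gamma_lr(x, theta1, theta0)
        r2 = (1.0 + r2) * gamma_lr(x, theta2, theta0)
        if 0.5 * r1 + 0.5 * r2 >= A:
            return n
    return None
```

Each step costs O(1), in contrast to the O(n) per-step cost of the estimator-based SRRS statistic.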

Based on 10,000 runs, we calibrated A to yield ARL to false alarm 1000
for each of the three pairs $(\theta_1, \theta_2)$ = (0.8, 1.25), (0.65, 1.5), (0.5, 2) (with
$\theta_0 = 1$), and ran 10,000 simulations of delay to detection (when the change
is in effect from the start) for each of the post-change parameter values
$\theta$ = 0.35, 0.5, 0.65, 0.8, 1.25, 1.5, 1.75, 2, 2.5, 3, as above. The results are
included in Table 3. Not surprisingly, the farther apart $\theta_1$ and $\theta_2$ are, the
shorter the expected delay to detection for extreme values of the post-change
parameter and the longer the delay for values close to $\theta_0$. Of the three
examples chosen, the one with $(\theta_1, \theta_2)$ = (0.65, 1.5) seems most similar to the
SRRS (t = 1) scheme, which for the values of $\theta$ chosen requires about 15%
longer delay for $\theta$ near $\theta_0$ and about 10-30% shorter delay for extreme values
of $\theta$.

Finally, in Figure 1 we present histograms of the distribution G of the limit 
as n 00 of 6n under the measure Q (for t = 0, 0.5, 1). The intervals on the 
horizontal {6) axis have width 0.1. This G can be regarded as a "natural" 
prior on the post-change 6, which could have been used as a mixing measure 
had mixtures been technically feasible. 

It is important to note that we are not trying to make a case for the SRRS 
procedure as the method of choice for the specific problems considered in 

Table 3

Procedure     t = 0   t = 0.5     t = 1       MLE   P-M (0.8,1.25)   P-M (0.65,1.5)   P-M (0.5,2)
True α
0.35           10.2       9.2       9.5       9.6             15.4             10.1           8.2
0.50           18.9      17.6      17.7      18.2             25.4             17.5          15.3
0.65           40.2      38.0      37.2      38.6             43.7             33.6          33.4
0.80          112.7     104.9     101.6     108.5             94.9             94.0         122.3
1.25          107.6     108.4     105.9      98.2             93.2             94.4         150.3
1.50           40.8      41.6      41.1      39.4             48.0             36.0          40.1
1.75           23.6      24.3      24.3      23.3             34.8             23.6          20.5
2.00           16.6      17.0      17.1      16.3             28.7             18.5          14.2
2.50           10.2      10.6      10.8      10.3             22.4             13.8           9.6
3.00            7.5       7.8       8.0       7.6             19.3             11.6           7.6
A               440       555       578       632              838              700           565

Monte Carlo: Each cell 10,000 runs; B = ARL to false alarm = 1000. (Worst) Average
delay for various procedures and various values of (true) α.
Estimation: $\hat\alpha_{n,k} = (\sum_{i=k,k+1,\ldots,n-1} X_i + t)/(n - k + t)$.
Pair-Mixture: (α*, α**); P(α = α*) = P(α = α**) = 1/2.

Fig. 1. Histograms of simulation of G.

this section and the preceding section. Rather, these sections are meant to 
introduce the SRRS schemes, show how to apply them, illustrate issues re- 
lated to proofs of their asymptotics and indicate that they can be effectively 
used in situations where mixtures are difficult to handle, resulting in at 
most a slight decrease in efficiency. For single-parameter problems like the 
Gamma shape parameter, even simpler procedures may show quite accept- 
able performance (though not asymptotic efficiency), as illustrated by the 
simulation results for well-chosen mixtures of two simple Shiryayev-Roberts 
statistics. However, in multiparameter problems such approaches lose their 
simplicity and transparency, and mixtures are often intractable, in which 
case the SRRS approach offers worthwhile advantages. 

5. A multiparameter example. Consider a situation where one monitors
simultaneously m independent processes, the ith of which yields independent
Gamma($\theta_0 = 1$, $\beta_i$)-distributed observations when the processes are
under control ($\beta_i$ are known), and, either by design or by the nature of the
problem, the observations are taken one m-vector at a time, the first being
observations taken simultaneously from the processes 1, 2, ..., m, the second






starting with a second set of observations from the processes 1, . . . ,m, and 
so on. A change, if it takes place, may affect some or all of the parameter 
values, which may be different for the different processes. 

For illustration's sake, imagine that the observations are taken one a day,
and, when everything is under control, the distribution of an observation is
exponential with a parameter that depends on the day of the week in which
the observation was made. After standardization, all of the observations have
a Gamma($\theta_0 = 1$, 1) distribution (pre-change). If there is a changepoint, then
subsequent observations are assumed to have a Gamma($\theta$, 1) distribution,
where the post-change shape parameter depends on the day of the week. In
other words, changes may be in the $\theta$ value for one of the days, for some
of them, or even for all of them, and the post-change parameter values may
differ for different days. (The observations are all assumed to be independent.)
We assume that the observations are obtained weekly, and a change
may take place only between weeks. Here m = 7, and the observations are
vectors $X_i$, where $X_{ij}$ denotes the observation on the jth day of the ith
week. For a method-of-moments approach, define

$$\theta_{n,k,j} = \Big(\sum_{i=k}^{n-1} X_{ij} + s\Big)\Big/(n-k+t)$$

for $n > k$, where $\theta_{k,k,j} = \theta_0 = 1$ if $s \wedge t = 0$,

$$\Lambda_{n,k} = \prod_{i=k,\ldots,n;\ j=1,\ldots,7} \big[f_{\theta_{i,k,j}}(X_{ij})/f_{\theta_0}(X_{ij})\big],$$

$$R_n = \sum_{k=1}^{n} \Lambda_{n,k},$$

$$N_A = \min\{n \mid R_n \ge A\}.$$

For a maximum-likelihood approach, take $\theta_{n,k,j}$ = solution of
$E_\theta \log X = (s + \sum_{i=k}^{n-1} \log X_{ij})/(n - k + t)$ for $n > k$, and $\theta_{k,k,j} = \theta_0 = 1$.

Analogs of Theorems 4 and 5 are valid, the only change being that $\gamma$ has to
be recalculated (by Monte Carlo, in a manner analogous to Section 4). The
application of the schemes is straightforward, requiring a short computer
program.
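
As a rough indication of what such a short program might look like, here is a sketch of the method-of-moments version of the scheme for weekly m-vectors. It is an illustration under assumptions, not the authors' code: the name `srrs_path`, the data layout (a list of weekly lists) and the default s = t = 1 are hypothetical. The estimate used for week n under putative changepoint k is built only from weeks k, ..., n−1, and the week-k factor is taken to be $\theta_0$ when $s \wedge t = 0$.

```python
import math

def gamma_log_lr(x, theta, theta0=1.0):
    """Log likelihood ratio of Gamma(theta, 1) vs Gamma(theta0, 1) at x."""
    return (theta - theta0) * math.log(x) + math.lgamma(theta0) - math.lgamma(theta)

def srrs_path(weeks, s=1.0, t=1.0, theta0=1.0):
    """Return [R_1, ..., R_n] for weekly vectors weeks[i][j] (all entries > 0)."""
    m = len(weeks[0])
    log_lam = []  # log_lam[k] = log Lambda_{n,k} for candidate changepoint k
    sums = []     # sums[k][j] = sum of X_ij over weeks k, ..., n-1
    path = []
    for x in weeks:
        log_lam.append(0.0)
        sums.append([0.0] * m)
        n = len(log_lam) - 1  # 0-based index of the current week
        for k in range(n + 1):
            count = n - k  # number of earlier weeks available since k
            for j in range(m):
                if count == 0 and min(s, t) == 0:
                    theta = theta0  # no data and no prior weight yet
                else:
                    theta = (sums[k][j] + s) / (count + t)
                log_lam[k] += gamma_log_lr(x[j], theta, theta0)
            # Only now add the current week to the running sums, so that
            # estimates never use the observation they are applied to.
            for j in range(m):
                sums[k][j] += x[j]
        path.append(sum(math.exp(v) for v in log_lam))
    return path
```

The stopping time is then the first n with `path[n-1] >= A`. The work per week grows linearly in n, which is the price of keeping a separate estimate for every putative changepoint.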

In this example, even a discrete mixture is impractical. The simplest
reasonable choice would put a prior of 1/3, 1/3, 1/3 on $\theta_0 = 1$ and some
$\theta_1$ and $\theta_2$, independently for each of the seven days of the week, leading
to a discrete mixture of $3^7 = 2187$ points [deleting, perhaps, the point
$(\theta_0, \theta_0, \ldots, \theta_0)$]. (One needs at least three $\theta$'s to allow for the fact that there
may be a change in only a subset of the parameters.) Schemes that put
weight on more than three $\theta$'s are even more cumbersome. Schemes that
reduce the number of points will be inefficient for detecting certain constellations
of change. On the other hand, the SRRS scheme is intuitive and fairly
easy to implement. Furthermore, it has the flexibility of easily accommodating
prior knowledge of the region where post-change parameters may be.
For example, if the only possibility of a change is for the shape parameter to
increase, the estimator can be restricted to $\theta > \theta_0$. Or, if for some reason it is
clear that the post-change shape parameter will be larger on weekdays than 
on weekends, then the estimators may be modified to reflect this. This may 
be a reason to consider SRRS schemes even in problems where in the nonre- 
stricted settings mixtures are feasible; if restrictions are added, integration 
with respect to a conjugate prior may prove to be much less tractable than 
in an unrestricted context. 

6. Remarks. 1. Intuitively, one would expect the Robbins-Siegmund type 
of rule to do somewhat worse than its mixture counterpart. For example, in 
the Gamma shape parameter problem, if one takes s = 0, then the parameter 
value used for the first likelihood ratio equals the initial pre-change param- 
eter value, so that the (estimated) likelihood ratio of the first observation is 
unity no matter what the value of the first observation is. In other words, 
one "loses" an observation, something which does not occur when employ- 
ing the mixture analog. The decision whether or not to stop sampling at the 
nth stage is based on a sufficient statistic under the mixture rule but not 
under the Robbins-Siegmund rule. Nonetheless, the latter method seems 
to perform nearly as well as the mixture rule, as indicated by Theorem 2 
and the discussion at the end of Section 2. We ran a simulation to compare
the SRRS and its mixture analog for detecting a change of a normal
mean. (Here mixtures are easy to implement; we just wanted to see how
the methods compare in a "standard" context.) We assumed the variance
of the observations to be 1 and the pre-change mean to be 0. By numerical
calculations of the variance of $G_{s,t}$ of Section 2, $G_{s=0,t=0.42626} = N(0,1)$. We
constructed the SRRS scheme (with s = 0, t = 0.42626) in the same manner
as is done in (9) for the Gamma example and compared it by simulation to
the mixture rule with a $N(0,1)$ mixing measure. The results are recorded in
Table 4. (The case $\mu = 0$ gives the simulated ARLs to false alarm. For $\mu > 0$,
$\nu$ was taken to be 1.)

Table 4 indicates that the time to detection is, as to be expected, somewhat
longer for the SRRS rule, but the difference is not very great. For
$\mu = 0.25$ the difference is insignificant, and for the other values of $\mu$ that
were investigated, over the range $\mu = 0.5$ to $\mu = 3$ there is a remarkably
consistent pattern: the ARLs of the SRRS rule are about 0.4 or 0.5 larger
than those of the mixture rule. (The ARLs to false alarm for the two rules
are roughly equal for each A in the range investigated, with the mixture
rule having an ARL about 1-2% larger than the SRRS. This difference is
small, and adjusting for it hardly changes the picture. The overshoot effect
is $E_\infty N_A/A \approx 1/\gamma \approx 1.5$.)

2. Although we conjecture that the SRRS scheme is never better than its
mixture analog, the following example indicates that it is in some cases no
worse.

Let $X_i \sim$ Binomial(1, p) i.i.d.; $H_0: p = p_0$, $H_1: p \ne p_0$. Suppose $\hat p_1 = s/t$,
$\hat p_{n+1} = (s + \sum_{i=1}^{n} X_i)/(t + n)$, where $0 < s < t < \infty$ are constants. Note
that the behavior of the sequence $X_1, X_2, \ldots$ [with the conditional distribution
of $X_n$ given the past being Binomial(1, $\hat p_n$)] is identical to that of the
sequence $X_1, X_2, \ldots$, where, conditioned on p, the $X_i$ are i.i.d. Binomial(1, p)
variables and there is a Beta(s, t − s) prior on p. In this setting clearly $\hat p_n \to p$
a.s. as $n \to \infty$. Hence G is Beta(s, t − s) (the same as the prior on p). In this
example SRRS is identical to its mixture counterpart.
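
The identity behind this remark can be checked numerically. The sketch below (function names hypothetical) compares the product of the plug-in predictive probabilities with the Beta(s, t − s) mixture of the Binomial likelihood, which has the closed form $B(s + S_n,\, t - s + n - S_n)/B(s,\, t - s)$ with $S_n = \sum_{i \le n} X_i$. Dividing either quantity by the null likelihood gives the corresponding likelihood ratio, so the two schemes coincide.

```python
import math

def plugin_likelihood(xs, s, t):
    """Product over n of the predictive probability of X_n under p_hat_n."""
    like, successes = 1.0, 0
    for n, x in enumerate(xs):  # n = number of earlier observations
        p_hat = (s + successes) / (t + n)
        like *= p_hat if x == 1 else 1.0 - p_hat
        successes += x
    return like

def beta_mixture_likelihood(xs, s, t):
    """Integral of p^k (1-p)^(n-k) against a Beta(s, t - s) prior on p."""
    k, n = sum(xs), len(xs)

    def log_beta(a, b):
        return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    return math.exp(log_beta(s + k, t - s + n - k) - log_beta(s, t - s))
```

For xs = [1, 0, 1, 1, 0] with s = 1 and t = 3, both functions give 1/70.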

3. When there is a suitable invariance structure, the Robbins-Siegmund
technique can be applied also when the baseline is unknown. To illustrate
this, consider again the change of normal mean problem as in Remark 1,
but suppose that the initial baseline (the pre-change mean) is unknown.
Invariance considerations would base changepoint detection on the sequence
$\{Y_i\} = \{X_i - X_1\}$ instead of the original sequence $\{X_i\}$ (see [7]). The
unknown post-change parameter $EY_n$ can be estimated by
$(Y_k + \cdots + Y_{n-1})/(n - k)$.

4. Usually there is no obvious natural prior for a mixture rule, whereas
there are natural estimates. At least in theory, in such cases the estimates
can be regarded as inducing a natural prior. For instance, in the example
treated in Section 2, if $\bar X$ is considered to be a natural estimate of the mean,
then $s = t = 0$ and $G_{s=0,t=0} = N(0, \pi^2/6)$. So one can argue that a natural
mixture rule is one with a $N(0, \pi^2/6)$ prior on $\mu$.

Table 4
Simulated ARLs for detecting a change of a normal mean, 40,000 runs

A       μ = 0   μ = 0.25   μ = 0.5   μ = 0.75    μ = 1   μ = 1.5    μ = 2    μ = 3
400       587      104.8      38.5       20.9    13.57      7.55     5.11     3.18
          599      104.7      38.1       20.5    13.13      7.11     4.68     2.73
450       661      109.2      39.6       21.3    13.82      7.66     5.17     3.21
          673      109.0      39.1       20.9    13.38      7.22     4.74     2.76
500       739      113.0      40.5       21.7    14.05      7.76     5.23     3.24
          748      112.9      40.1       21.3    13.60      7.32     4.80     2.78
550       813      116.6      41.3       22.1    14.25      7.84     5.28     3.26
          823      116.3      40.9       21.7    13.80      7.41     4.85     2.81
600       886      119.7      42.1       22.4    U.U        7.93     5.33     3.29
          900      119.7      41.6       22.0    13.97      7.49     4.90     2.83
650       961      122.8      42.8       22.7    14.62      8.00     5.37     3.31
          973      122.7      42.3       22.3    14.14      7.57     4.94     2.85
700      1037      125.5      43.4       23.0    14.77      8.07     5.41     3.32
         1052      125.4      43.0       22.6    14.30      7.64     4.98     2.87
s.e.     0.43       0.35      0.11       0.05     0.03     0.014    0.008    0.004

(For each A, first row: SRRS; second row: mixture.)

5. For a practical application, it is not imperative to prove an analog 
of Lemma 1 (though its validity is an ingredient in ensuring asymptotic 
optimality). Heuristically, the overshoot correction can be expected to be 
nearly a constant function of A once A is reasonably large, and the constant 
can be estimated by the simulation methods discussed in Section 4. 

6. Another approach to evaluation of $\gamma$, the limit value of the overshoot
correction, has been proposed by Yakir and Pollak [19]. That method has the
potential to allow error estimates, but in the problem of detecting a change
in $\theta$, the shape parameter of the Gamma distribution, the error estimates
proved to be difficult to apply.

7. Deleting the last observation from use in the estimation process pre- 
serves the martingale structure of the Shiryayev-Roberts statistic. Initially, 
it seems unnatural to exclude the last observation: after all, this seems to en- 
tail a slight loss of efficiency and foregoing sufficiency. The following example 
shows that there is more involved than mere preservation of a mathematical 
(martingale) property: at least in the hypothesis testing case, inclusion of 
the last observation in the estimation sequence can wreak havoc with the 
level of significance. 

As in Section 2, consider a power one test of $H_0: X_i \sim N(0,1)$ versus
$H_1: X_i \sim N(\mu,1)$, where $X_i$ are i.i.d. random variables and $\mu \in (-\infty,\infty)$.
Here the likelihood ratio is $\Lambda_n(\mu) = \exp(\mu\sum_{i=1}^{n} X_i - n\mu^2/2)$. If one
substitutes the maximum likelihood estimator $\sum_{i=1}^{n} X_i/n$ for $\mu$, then the
stopping rule based on the estimated likelihood ratio becomes
$N_A = \min\{n \mid \Lambda_n(\sum_{i=1}^{n} X_i/n) \ge A\} = \min\{n \mid |\sum_{i=1}^{n} X_i| \ge [2n\log(A)]^{1/2}\}$.
The law of the iterated logarithm implies that $P_{H_0}(N_A < \infty) = 1$ regardless
of the value of A, so that one loses the capability of having a nontrivial
probability bound on the level of significance.

One stands to lose even if one uses the nth maximum likelihood estimator
for the nth likelihood ratio only; that is, write
$\Lambda_n = \exp\{\sum_{i=1}^{n}(\hat\mu_i X_i - \hat\mu_i^2/2)\}$ with $\hat\mu_i = \sum_{j=1}^{i} X_j/i$, and define
$N_A = \min\{n \mid \Lambda_n \ge A\}$. One can
show that here, too, $P_{H_0}(N_A < \infty) = 1$ regardless of the value of A. (Sketch
of proof: Show that Ehq log A„ = ^(logn)(l + o(l)) and Var//„ A„ = |(logn)^(H- 
0(1)) as n ^ 00. Argue that for < e < 1, asymptotically P//,j{logA„ > 
e X ^logn} > 6 for some 6 > 0. Then break the time axis into intervals 
[l,ni], [ni + l,n2], [n2 + l,n3], . . . large enough so that logA„. are "almost" 
independent, and conclude that for any fixed A, P/^gjA^ > A for some 1 < 
n < 00} = 1.) 

This phenomenon is not as marked in the changepoint detection context. 
See [15]. 

8. In the multiparameter case, our methods are more flexible than indi- 
cated. For example, reconsider Section 5. Our methods can be designed for 






the case that observations are taken on a daily basis, and the change may 
occur between days rather than only between weeks. 

Let the observations be labeled $X_i$, where i is the number of days since the
onset of changepoint detection, and define
$\theta_{n,k} = (s + X_{n-7} + X_{n-14} + \cdots + X_{n-7r})/(t + r)$
[or define $\theta_{n,k}$ = solution of
$E_\theta \log X = (s + \log X_{n-7} + \log X_{n-14} + \cdots + \log X_{n-7r})/(t + r)$],
where $r = r(n,k) = \lfloor (n - k)/7 \rfloor$ and $\Lambda_{n,k}$, $R_n$ and $N_A$ are as in (9).

Here, too, analogs of Theorems 4 and 5 are valid, with $\gamma$ having to be
recalculated (by Monte Carlo, in a manner analogous to Section 4). The
application of the schemes is straightforward, requiring a short computer
program.

9. The argument used to prove Theorem 5(a) can serve as a model for
dealing with many similar problems. The essential ingredients are that the
estimators $\theta^*_{n+1}$ satisfy an equation of the form

$$E_{\theta^*_{n+1}} T(X) = [T(X_1) + \cdots + T(X_n)]/n$$

and that an analog of (15) holds, that is,

$$\mathrm{Var}_\theta\, T(X) \le a + b\,(E_\theta T(X))^2 \quad \text{for all } \theta.$$

7. Conclusion. We propose that the SRRS scheme is an efficient detec- 
tion scheme, and should be useful wherever mixture rules are desired but 
hard to implement. The construction and application of an SRRS rule is 
simple: all one needs is a sequence of (preferably efficient) estimators for 
the post-change parameter based on the first n — 1 observations. Each such 
estimator will be used to construct an estimated likelihood ratio of the nth 
observation. The likelihood ratios are used to construct a Shiryayev-Roberts 
statistic, as done in Section 3. In order to achieve an ARL to false alarm B,
a conservative rule will stop and declare a change to be in effect when the
Shiryayev-Roberts statistic first exceeds B. A rule that attains B approximately
as its ARL to false alarm will stop and declare a change to be in
effect when the Shiryayev-Roberts statistic first exceeds $A = B\gamma$. The constant
$\gamma$ has to be evaluated (usually) by simulation of tests of hypotheses as
in Section 4, but this is the only simulation required, and it takes very little
computer time.

APPENDIX

A.1. Sketch of proof of Theorem 4.

Sketch of proof of Theorem 4(i). Note that $\{R_n - n\}$ is a $P_\infty$-martingale
with zero expectation, and by the optional sampling theorem
$E_\infty(R_{N_A} - N_A) = 0$. Since by definition $R_{N_A} \ge A$, this implies that
$E_\infty N_A = E_\infty(R_{N_A}) \ge A$, which establishes (i).

Sketch of proof of Theorem 4(ii). We follow along the lines of
the proof of Yakir [18], Theorem 3 (and Theorem 1). Before introducing
notation in (11) below, we sketch the ideas of the proof.

Break up the time axis (the positive integers) into pieces of size m, and 
show that the Poo-distribution of Na can be approximated by using the 
distribution of the first block (of m observations) where stopping occurs. 
More precisely, given an integer j, the idea is to define (where A\B denotes 

and to show that 

(10) (1 - e)^m/A < Poo{Sj,^\Sj,m) < (1 + £)jm/A. 

This enables approximation of $N_A$ by

m × {a Geometric($p = \gamma m/A$)-distributed random variable},

from which $E_\infty N_A \approx m/(\gamma m/A) = A/\gamma$ follows.

In order to carry out this program, it turns out that one needs

$$\log A \ll m \ll A.$$

The key ingredient for proving (10) is a measure transformation that will be 
shown to yield 

-Poo(5'o,-m|So,m) = Poo{Na < m) 

= Ek{exp{- (log Rn ^ - log A)}; {k<NA<m})/ A. 

k=l,...,m 

Since log A <C m, for "most" fc's Pk{k < Na < rn} ~ 1. Also, a renewal- 
theoretic argument will show that the asymptotic {A — > oo) behavior (under 
Pk) of (logi?jv^ — log A) is the same as that of the log- likelihood ratio statistic 
in the context of the power one test. Therefore, -Poo(5'o,m|'5o,m) ~ rwy/A. The 
argument is extended to general -Poo('S'j,m|'5j,m) by induction on j. 

In order to make the analogy to Yakir [18] more transparent, note that
the Gamma($\theta$, 1) family can be transformed into an exponential family with
canonical form: if $X \sim$ Gamma($\theta$, 1), then a reparameterization $y = \theta - \theta_0$
and an appropriate affine transformation $X^*$ of $\log X$ yield a family of
probability measures of $X^*$ with densities

$$f_y(x) = \exp\{yx - \psi(y)\} f_0(x), \qquad y \in (-\theta_0, \infty),$$

where $\psi(0) = \psi'(0) = 0$. With this notation, the estimator $y(n,k)$ (for the
parameter of the nth observation under the putative $\nu = k$) dictated by (9)
is

$$y(n,k) = \Big(\sum_{i=k}^{n-1} X_i + s\Big)\Big/(n-k+t)$$

for $n > k$, where $y_{k,k} = 0$ if $s \wedge t = 0$.
Roughly emulating Yakir's notation, let 
Zf=yX*-^P{y), 

Vi=k,...,n ) 

R„= Y expj E zf''A= Y dPf-'^'^/dP^, 

k=l,...,n Ki=k,...,n ) k=l,...,n 

Na =mm{n\Rn > A}, 
a = log A, 

M(yl) =minJ n| ^ zf''^'^ > log a\ =ta, 

[ i=l,...,n J 

Qk = measure analogous to the Q-measure of the proof of Theorem 1, 
appropriate to the Gamma(^,l) context dictated by (9), applied 
(11) toX*,X*^„..., 

H = asymptotic distribution (under the measure Qk=i) 
of the overshoot 

i=l,...,TA 



j exp{— x} (i-fr(x). 
We obtain an analog of Yakir's [18] Lemma 1: 

Lemma Y1. Let $m = m(A)$ be a sequence that satisfies

$$A/m \to \infty \quad \text{and} \quad (\log A)/m \to 0 \quad \text{as } A \to \infty.$$

Then

(12) $P_\infty(N_A \le m)/P_\infty(M(A/m) \le m) \to 1$ as $A \to \infty$.

Sketch of proof. For any stopping time N , 
Poc{NA<m)= V / dPoo 
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j=l,...,m k=l,...,j 



fc=l,...,m 



{k<N<m} 



Let r(n) = \og Rn- Now 



(13) 



exp{-(r(iVA)-a)}dQfc/A 

{fc<7VA<m} ■" J{fe<AfA<m} 

By an argument analogous to the proof of Theorem 1, the denominator in
(12) can be shown to be

$$P_\infty(M(A/m) \le m) = (\gamma m/A)(1 + o(1)),$$

since m grows faster than $\log A$. Therefore, it will suffice to show that, for
most of the k's, the value of the integral on the right-hand side of (13) is
approximately $\gamma$. Note that



R 



k—l+n 6 



fe,...,fc-l + n » 



■i=i,...,fe-i 



-j....,k — l '^i g /^i — k,...,k—l + n^'^i i ' _|_ ]_ 



+ E - 

j=fc+l,...,fc— l+n 



i — k,....j — l i f3jL^i=j,...,k — l + n^ i i ' 



y(i,i) yy{i,k)-. 



so that 

r{k — 1 + n) 



E ^ 

i=fc,...,fc— 1+n 



y{i,k) dcf 



log[Wo(A:,n) + l + VFi(/c,n)]. 



Observe that for i > c> 6 

y{i,c) -y{i,h) 

= ( E ^« + 5j/(i-c + t)- ( ^ Xu + s\/{i-b + t) 

\u=c,...,i—l J \u=b,.. .,1—1 J 

= {c-b)s + {c-b) Xu-{i-c + t) ^« 

X [(i-c + t)(i-6 + t)]-S 

and argue that $W_0(k,n) \to W_0(k,\infty)$ and $W_1(k,n) \to W_1(k,\infty)$ a.s. ($Q_k$) as
$n \to \infty$, where both $W_0(k,\infty)$ and $W_1(k,\infty)$ are a.s. ($Q_k$) finite. Also note
that for $r > 0$, $u_1 > 0$, $u_2 > 0$,

$$|\log(r + 1 + u_1) - \log(r + 1 + u_2)| \le |\log(1 + u_1) - \log(1 + u_2)|.$$

These relations imply that Theorem A.7 of Siegmund [14] applies, uniformly
in k and in the value of $R_{k-1}$. So (nonlinear renewal theory implies that)
the overshoot $r(N_A) - a$, given the value of $R_{k-1}$, has the same asymptotic
distribution as the overshoot of the random walk J2i=i,...,M{A) ' ■ 

An argument identical to that of Yakir ([18], first half of page 276, ver- 
batim), after replacing Yakir's R{j,y), r{j,y), N{A,y), dPj^ and 7 by Rj, 

r{j), Na, dPj!^^'''^ and 7, completes the proof of Lemma Yl. With the same 
replacements, the rest of the argument of Yakir (verbatim, from the mid- 
dle of page 276 until the end of Section 2 on page 278) accounts for our 
Theorem 4(ii). 

Sketch of proof of Theorem 4(iii). One has to check that the 
conditions of Theorem 3 are satisfied. It is straightforward to check that 
Ee{On — ^)^ = 0(l/n^). As for the other condition, take c = 6/2, write m = 
n — 1 and note that 

Ee{\og+{e-'))l{9n<c) 

<log{n-l + t)Pg{dn<e/2) 

xll Xi + S<n-l + t\ll Xi + S<l], 

\ j=l,...,n— 1 / \i=l,...,m I 

and, since | logx| < 1/x for < x < 1, 

-Ee{\og[^ Y Xi + syji{en<e/2) 

xll Y Xi + S<n-l + t]ll Y Xi + S<l] 

\j=l,...,n— 1 / \i=l,...,m / 



< J {l/x){x^'^-^/r{em))e-'' dx 
= (l/(6'?n))P(Gamma(6lm - 1, 1) < 1). 
Apply standard manipulations of the Gamma distribution to obtain 
Y Ee(log+{e-'))l{en<9/2)<<x>. 



n=l,...,m 
A.2. Sketch of proof of Theorem 5. Once (a) is proved, the rest follows
along the same lines of the proofs in the cases considered in the former
sections.

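Proof of Theorem 5(a). Let $Y_n = \sum_{i=1}^{n} (\log X_i)/n$ and note that

$$E_{\tilde Q}(\log X_{n+1} \mid F_n) = E_{\theta^*_{n+1}} \log X = Y_n.$$

Because $\{Y_n\}$ is a $\tilde Q$-martingale, general martingale considerations imply
that $\mathrm{Var}_{\tilde Q} Y_n$ increases in n [since
$0 \le E_{\tilde Q} E_{\tilde Q}((Y_{n+1} - Y_n)^2 \mid F_n) = \mathrm{Var}_{\tilde Q} Y_{n+1} - \mathrm{Var}_{\tilde Q} Y_n$]. Since

$$E_{\tilde Q}[(\log X_{n+1} - Y_n)^2 \mid F_n] = \mathrm{Var}_{\theta^*_{n+1}} \log X,$$

by writing

$$Y_{n+1} = Y_n + (\log X_{n+1} - Y_n)/(n + 1),$$

squaring both sides and taking conditional expectations one obtains

(14) $E_{\tilde Q}(Y_{n+1}^2 \mid F_n) = Y_n^2 + [\mathrm{Var}_{\theta^*_{n+1}} \log X]/(n + 1)^2$.

For $X \sim$ Gamma($\theta$, 1), one obtains by standard considerations that

$$\theta E_\theta \log X \to -1 \quad \text{and} \quad \theta^2\, \mathrm{Var}_\theta \log X \to 1 \quad \text{as } \theta \to 0,$$

so there is a $\theta^{**} > 0$ such that $\mathrm{Var}_\theta \log X \le 2(E_\theta \log X)^2$ for $\theta < \theta^{**}$. Also,
since $\theta\, \mathrm{Var}_\theta \log X \to 1$ as $\theta \to \infty$ and $\mathrm{Var}_\theta \log X$ is continuous in $\theta$, there is
a constant c such that $\mathrm{Var}_\theta \log X \le c$ for $\theta \ge \theta^{**}$. Thus

(15) $\mathrm{Var}_\theta \log X \le c + 2(E_\theta \log X)^2$ for all $\theta$.

Combining (14) and (15) and taking expectations, one obtains

$$E_{\tilde Q} Y_{n+1}^2 \le E_{\tilde Q} Y_n^2 + E_{\tilde Q}[c + 2(E_{\theta^*_{n+1}} \log X)^2]/(n + 1)^2$$
$$= E_{\tilde Q} Y_n^2 + E_{\tilde Q}[c + 2Y_n^2]/(n + 1)^2$$
$$= (1 + 2/(n + 1)^2)\, E_{\tilde Q} Y_n^2 + c/(n + 1)^2.$$

Since $\mathrm{Var}_{\tilde Q} Y_n$ was shown to be an increasing sequence,

$$E_{\tilde Q} Y_n^2 \ge \mathrm{Var}_{\tilde Q} Y_n \ge \mathrm{Var}_{\tilde Q} Y_1 \ge c/\delta$$

for some $\delta > 0$, and hence

$$E_{\tilde Q} Y_{n+1}^2 \le [1 + (2 + \delta)/(n + 1)^2]\, E_{\tilde Q} Y_n^2.$$

This shows that $\{E_{\tilde Q} Y_{n+1}^2\}$ is bounded, since the infinite product
$\prod_{n=1}^{\infty} [1 + (2 + \delta)/(n + 1)^2]$ converges. Hence $\{\mathrm{Var}_{\tilde Q} Y_n\}$ is bounded (and, being
monotone, is convergent). It follows that the martingale $\{Y_n\}$ has an a.s. ($\tilde Q$)
finite limit, and consequently $\{\theta^*_{n+1}\}$ has a finite positive limit.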
Sketch of proof of Theorem 5(b)(i). As in previous cases, $\{R_n - n\}$
is a $P_\infty$-martingale with zero expectation, and Theorem 5(b)(i) is established
by the optional sampling theorem. To see that the conditions of this theorem
are met, it suffices to show that $N_A$ is bounded from above by const × a
geometrically distributed random variable. Note that

$$\sum_{k=1}^{n} \Lambda_{n,k} \ge \Lambda_{n,n-1} = X_n^{\theta_{n,n-1} - \theta_0}\, \Gamma(\theta_0)/\Gamma(\theta_{n,n-1}) = \xi_n.$$

Since $\theta_{n,n-1}$ depends only on $X_{n-1}$, clearly $\xi_n$ depends only on $X_{n-1}$ and $X_n$.
Therefore, $\xi_2, \xi_4, \xi_6, \ldots$ are i.i.d. random variables under $P_\infty$, and thus
$N_A \le \min\{n \mid n = 2m,\ \xi_n \ge A\}$, which is 2 × a geometrically distributed random
variable.



Proof of Theorem 5(b)(ii). The proof follows along the same lines 
as that of Theorem 4(ii) and is therefore omitted. 

Sketch of proof of Theorem 5(b)(iii). It suffices to show that the 
conditions of Theorem 3 are met. 

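Let $\zeta(y) = E_y \log X = [d\Gamma(y)/dy]/\Gamma(y)$, and let $\zeta^{-1}$ denote the inverse
function of $\zeta$. Observe that $\zeta(y) = \log y + O(1)$ as $y \to \infty$, that
$\zeta'(y) = d\zeta(y)/dy = \mathrm{Var}_y \log X = (1 + o(1))/y$ as $y \to \infty$ and that $\mathrm{Var}_y \log X$ is a
decreasing function of y. Therefore, since $d\zeta^{-1}(t)/dt = 1/\zeta'(\zeta^{-1}(t))$, there
exists a finite constant $c_1 > 0$ such that for $t > E_{\theta_0} \log X$

$$0 \le \zeta^{-1}(t) - \zeta^{-1}(E_{\theta_0} \log X)$$

(16) $\le (t - E_{\theta_0} \log X)\, d\zeta^{-1}(t)/dt$

$$\le (t - E_{\theta_0} \log X)\, c_1 e^{t}.$$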

For p > 0, let (5 > eg. Since On+i = C"^ logA ) (where logA = ELi log ^ih) > 

(17) r& foo 

= PeMn+i-Oo\''>t)dt+ PeM+i-eo\''>t)dt. 
Jo Js 

The standard derivation of the asymptotic distribution of the maximum 
likelihood estimator coupled with large deviation analysis yields that 

(18) r PeoiK+i - >t)dt = {l + o{l))E\N{0, l)r(7i/(0)At(0))^^/l 







As for the second integral in (17), let s > 0. It follows from (16) that there 
exists a constant C2 > independent of 6 and s such that 



poo 

/ Pe,{\en+i-^o\''>t)dt 

J 5 
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< 



/ Pe,{{ log X - Ee, log Xfe^'^^^'f?^ > t) dt 

poo 

= / PooiilogX- Ee,, iogX)M(^°s^--^»oi°s^)+^«oi°s^]c? > t)dt 

J 5 

poo 

< / Pe,(logX>^e„logX + C2logO(it 
= Pe, (^(^f[ X,^ > e^^oo ^°sx+c2 iogt)ns^ 

roc 

J s 

s-{c2ns-l) 

C2ns — 1 

Combining this with (17) and (18), after setting 6 large enough so that 

{EeoX')/[6''^'e''^<'o^°^^] < i 

yields 

Eeje^ - 0or = (1 + o{i))E\N{o, i)\p{ni{e)K{e)r^^\ 

Thus the first two conditions of Theorem 3 (with p = 2 and p = 4) are
satisfied.

It is left to show that for some c > 
(19) Ee,[{\og{{9n)-^))'^t{9n<c)]<^. 

n=l,...,oo 

Since C,{y) = — (1 + o(l))/y as y ^ 0, it follows that there exists a constant 
< c* < 1 A 6*0 such that, for any < c < c*, logX < on {dn+i < c}, a nd 
for any such c there exists a constant C3 > such that 9n+i = C~^(log-'^) > 
C3/(— logX) on {6*^+1 < c}. The closer to zero that one chooses c, the closer 
to 1 one can set C3. Choose < s < ^0 and c and define C4 to be such that 
ca/c > [1 + T{9o - s)/T{9o)]^/' = C4. Now 

Ee,[iiogm+ir'))^m,^i<c)] 



(20) <ii;,J[(-logC3)+ + (log(-logX))+]l(0;+i <c)} 

< (-logC3) + P0o(^n+l < c)+^eJlog(-logX) + l(e;+i < C)}. 

Since c<9q, standard considerations of large deviation analysis yield that 
Pg^^{9n+i < c) is exponentially bounded, so that J2n=i 00 -f^o (^"+1 < c) < 
00. As for the last term in (20), recall that < s < and note that 



ii;,jiog(-iogX)+i(^;+i<c)} 

< / Pe,{log{-logX)>t}dt + {logCi)Pe,{9n+i<c). 

J log C4 
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The last term on the right-hand side sums to a finite result by the same 
argument just given, and the integral equals 

/ Pe,{- log X>e'}dt 

J log C4 

= / PgJexpi slog(l/Xi) ) >exp(sne*) Idt 

Jlogci [ Vi=l,...,n / J 



^ (j^e„exp(.log(l/X)))" ^^ 
~ /iogc4 exp(sne*) 



< ( l^gi—l^ ) / e-'^^'dt 









' J log C4 


r{9o-sy 




r(^o) . 


' J log C4 


r{9o-sy 


n ~sn 


r(0o) . 


I sn 



from which (19) follows using the definition of C4. 

A.3. Description of Monte Carlo. Let V'(^) = -E^exp{ — (log(AT-J — b)}. 

We wish to estimate 7 = liuif,-, 00 fp{b) by simulating exp{ — (log(AT-j) — 6)} 
r times for a large b and averaging the results. It is obviously efficient to use
the same simulation runs to estimate $\psi(b)$ for a chosen set of values $b_1, \ldots, b_k$.
We want to choose large $b_i$'s, large enough so that the simulation results
$\hat\psi(b_i)$ are approximately equal, in which case it is reasonable to assume that
they are close to the limiting value $\gamma$. The accuracy of simulation results is
improved by "averaging over intervals of b-values": define

$$\hat\psi(B_0,B_1) = \int_{B_0}^{B_1} \hat\psi(b)\,db/(B_1 - B_0) = \int_{B_0}^{B_1} \mathrm{avg}(\exp\{-\zeta(b)\})\,db/(B_1 - B_0),$$

where $\zeta(b)$ denotes the overshoot of $\{\log\Lambda_n\}$ over b, and "avg" denotes the
(sample) mean of the results for the r simulation runs.

Interchanging the operations of integration and averaging,

$$(B_1 - B_0)\,\hat\psi(B_0,B_1) = \mathrm{avg}\Big(\int_{B_0}^{B_1} \exp\{-\zeta(b)\}\,db\Big).$$

Consider the ladder variables (successive "record values") of the process
$\{\log\Lambda_n\}$, and define for a given run
$\Gamma_0 = B_0 < \Gamma_1 < \Gamma_2 < \cdots < \Gamma_{m-1} \le B_1 < \Gamma_m$
as the $m \ge 1$ ladder variables in $(B_0, B_1]$ and the first one overshooting
$B_1$. Then

$$\int_{B_0}^{B_1} \exp\{-\zeta(b)\}\,db = \sum_{i=1}^{m} \int_{a_{i-1}}^{a_i} \exp\{-(\Gamma_i - b)\}\,db,$$

where $a_0 = \Gamma_0 = B_0$, $a_1 = \Gamma_1, \ldots, a_{m-1} = \Gamma_{m-1}$ and $a_m = B_1$. It is easy to
calculate the integrals, yielding

$$(B_1 - B_0)\,\hat\psi(B_0,B_1) = \mathrm{avg}\Big[\sum_{i=1}^{m} (1 - \exp\{\Gamma_{i-1} - \Gamma_i\}) + \exp\{B_1 - \Gamma_m\} - 1\Big].$$

This formula is easily applied by accumulating on each simulation run the
terms coming from the "ladder steps," $\Gamma_j - \Gamma_{j-1}$, and using also the values
$\Gamma_m - B_1$ when $B_1$ is first exceeded, ending the run.
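
The per-run accumulation can be sketched as below. The function name `ladder_integral` and the convention of returning `None` when the upper boundary is never reached are assumptions of this illustration (the treatment of truncated runs is a separate matter). Given one simulated path of log likelihood ratios, the function returns the integral over $[B_0, B_1]$ of exp{−overshoot over b}, computed from the successive record values.

```python
import math

def ladder_integral(log_lams, b0, b1):
    """Integral over b in [b0, b1] of exp(-(overshoot of the path over b)).

    Accumulates one term per ladder step above b0 and closes with the
    correction for the first record value exceeding b1.
    """
    total = 0.0
    prev = b0            # plays the role of the previous ladder value
    record = -math.inf   # running maximum of the path
    for v in log_lams:
        if v <= record:
            continue     # not a new record value
        record = v
        if v <= b0:
            continue     # records below b0 do not contribute
        total += 1.0 - math.exp(prev - v)
        if v > b1:
            return total + math.exp(b1 - v) - 1.0
        prev = v
    return None          # b1 was not reached on this path
```

Averaging the returned values over the runs and dividing by $B_1 - B_0$ gives the interval-averaged estimate.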

For each of the cases t = 0, 0.5, 1, three simulations of r = 5000 runs
each were performed, using $[B_0,B_1]$ = [10, 15], [15, 20] and [20, 25]. The
results, shown in Table 1, indicate that $b \ge 10$ (corresponding to
$\alpha \le e^{-10} \approx 1/22{,}000$) is large enough in the present example to provide a
stable estimate of $\gamma$.

The simulation runs were truncated after $n_{\max}$ = 50,000 observations
when $[B_0,B_1]$ = [10, 15], and after 75,000 and 100,000 observations in the
other two cases. In 1-2% of the runs, the boundary $B_1$ was not reached
before truncation (due to $\theta$ estimates close to the null value, $\theta_0 = 1$). In most
of these instances, $B_0$ was not reached either. In the latter cases, the results
for those runs were divided not by $B_1 - B_0$ but by "the largest observed
ladder variable" $- B_0$. When $B_0$ was not reached, the value 1 was used as
the output of the run in computing the averages over the r runs. Both of
these adjustments seem appropriate and have a small effect on the tabulated
results, presumably causing a very slight positive bias of the estimates of $\gamma$,
much smaller than their standard errors.

The simulations reported in Table 3 were speeded up using linear inter- 
polation in a table of 30,000 values of the maximum likelihood estimator 
over the range [—10, 10] for the average of the logX's, a tactic which should 
not be needed when the SRRS procedure is applied to a single observed 
sequence. 